This readme file is a companion to the files “Study1.csv”, “Study2.csv”, “Study3.csv”, and “Study4.csv”, which contain the data analyzed for the four studies in the paper “Extracting the Wisdom of Crowds When Information is Shared” by Asa B. Palley and Jack B. Soll. The corresponding R script analyzing the data is called “Studies1234.R”. Below we describe the variables in the data file for each study:

Study 1 Variables. Note Study 1 responses are stored in a “wide” format, so that responses for all 8 coins for a participant are listed in a single row in the csv file.

SubjectID: unique participant identifier

Condition: keeps track of the information setting, with Condition=1 for Symmetric; Condition=2 for Nested-Symmetric; Condition=3 for Nested

minutes: number of minutes elapsed from start to finish of the study for the participant

chosen: a unique code (assigned randomly) identifying the specific random sequences of coin flips for each of the 8 coins (which were generated randomly according to the true coin biases before the start of the experiment).

subCondition: In the Symmetric condition, keeps track of the weighting between public and private signals: subCondition=1 for 9 shared, 3 private signals, w=0.25; subCondition=2 for 9 shared, 9 private signals, w=0.5; subCondition=3 for 3 shared, 9 private signals, w=0.75. In the Nested-Symmetric condition, keeps track of the proportion of mavens: subCondition=1 for p=0.25; subCondition=2 for p=0.5; subCondition=3 for p=0.75. In the Nested condition, keeps track of the proportion of experts: subCondition=1 for p=0.25; subCondition=2 for p=0.5; subCondition=3 for p=0.75.

weight: the relative information weight w on private versus shared signals for those 8 coins

proportion: the probability that a participant in the crowd was assigned to the expert or maven role

numShared: the number of total flips observed in the shared signal for those 8 coins

numPrivate: the number of total flips observed in the private signal for those 8 coins
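
The w values listed under subCondition for the Symmetric condition are consistent with the weight being the private share of all observed flips, w = numPrivate / (numShared + numPrivate). The Python sketch below checks that arithmetic for the three Symmetric pairings. The function name is hypothetical, and the formula is inferred from the listed values rather than taken from Studies1234.R.

```python
from fractions import Fraction

def private_weight(num_shared, num_private):
    """Share of observed flips that are private (assumed meaning of `weight`)."""
    return Fraction(num_private, num_shared + num_private)

# (numShared, numPrivate) for subCondition 1, 2, 3 in the Symmetric condition
pairs = {1: (9, 3), 2: (9, 9), 3: (3, 9)}
weights = {sub: float(private_weight(s, p)) for sub, (s, p) in pairs.items()}
print(weights)  # {1: 0.25, 2: 0.5, 3: 0.75}
```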

C1SharedSum through C8SharedSum: the number of heads observed in the shared signal for coins 1 through 8

C1PrivateSum through C8PrivateSum: the number of heads observed in the private signal for coins 1 through 8 (only relevant for experts and mavens)

theta1 through theta8: true bias of coins 1 through 8

pos1 through pos8: order in which coins 1 through 8 were displayed to that participant

type: role assigned to the participant for all 8 coins. type="N" means layperson; type="S" means "expert" in the Nested condition or "maven" in the Nested-Symmetric condition.

Quiz1: Participants were asked “Suppose that a biased coin has an 80% chance of coming up heads. How many heads would you expect to see in 100 new flips of the coin?” The possible answers are coded as 1 = “5”, 2 = “10”, 3 = “20”, 4 = “50”, 5 = “80”. (The correct answer was 5).

Quiz2: Participants were asked “Suppose that a coin has a 40% chance of coming up heads. If you flip the coin 100 times, which statement is true?” The possible answers are coded as 1 = “The chances of getting 91 heads is exactly the same as the chances of getting 43 heads”, 2 = “Any number of heads is technically possible, but the chances are good that I will get between 30 and 50 heads”, 3 = “It is impossible to get fewer than 10 heads”, 4 = “There is a 50% chance that the coin will come up tails”, 5 = “I will definitely see exactly 40 heads”. (The correct answer was 2).

age: participant’s stated age

female: indicator variable for whether participant stated their gender as female
	
device: Participants were asked “What type of device are you using to take this survey?” The possible answers are coded as 1 = “Desktop computer”, 2 = “Laptop”, 3 = “Tablet”, 4 = “Smartphone”, 5 = “Other (please indicate)”

f1 through f8: participant’s own forecast for coins 1 through 8

g1 through g8: participant’s guess of others’ average forecast for coins 1 through 8
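
Because each Study 1 row holds all 8 coins, per-coin analysis usually starts by unpacking a row into one record per coin, similar to the long layout used for Study2.csv. The following Python sketch shows one way to do that for a row represented as a dict keyed by column name (for example, as produced by csv.DictReader). The function name wide_to_long and the example values are made up for illustration; only the column names come from the list above.

```python
def wide_to_long(row, n_coins=8):
    """Split one wide Study 1 row (dict keyed by column name) into one record per coin."""
    records = []
    for k in range(1, n_coins + 1):
        records.append({
            "SubjectID": row["SubjectID"],
            "Coin": k,
            "theta": row[f"theta{k}"],
            "SharedSum": row[f"C{k}SharedSum"],
            "PrivateSum": row[f"C{k}PrivateSum"],
            "f": row[f"f{k}"],
            "g": row[f"g{k}"],
        })
    return records

# Made-up values for illustration only; real rows would be read from Study1.csv.
example = {"SubjectID": "demo"}
for k in range(1, 9):
    example.update({
        f"theta{k}": 0.5, f"C{k}SharedSum": 4, f"C{k}PrivateSum": 2,
        f"f{k}": 50 + k, f"g{k}": 50,
    })

long_rows = wide_to_long(example)
print(len(long_rows), long_rows[0]["Coin"], long_rows[2]["f"])
```

Keeping SubjectID on every record lets the per-coin rows be grouped back by participant later.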
 

################################################################


Study 2 Variables. Note Study 2 responses are stored in a “long” format, so that responses for each of the 8 coins for a participant are listed in separate rows in the csv file.

SubjectID: unique participant identifier

minutes: number of minutes elapsed from start to finish of the study for the participant

chosen: a unique code (assigned randomly) identifying the specific random sequences of coin flips for each of the 8 coins (which were generated randomly according to the true coin biases before the start of the experiment).

numShared: the number of total flips observed in the shared signal for that coin

numPrivate: the number of total flips observed in the private signal for that coin

incentives: variable indicating payment incentives for responses 

guessothers: variable indicating whether or not participant guessed the average forecast of others

Condition: keeps track of the information setting, with Condition=1 for Forecast Only with Accuracy incentives, Condition=2 for Forecast Only with Competitive incentives, Condition=3 for Guessing Others with Accuracy incentives, Condition=4 for Guessing Others with Competitive incentives
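
The four Condition codes cross two tasks with two incentive schemes, so it can be convenient to decode them into separate labels. The Python sketch below is a minimal lookup built only from the coding listed above. The names STUDY2_CONDITIONS and decode_condition are hypothetical, and the sketch does not assume anything about how the incentives and guessothers columns themselves are coded.

```python
# Condition code -> (task, incentive scheme), as listed for Study 2
STUDY2_CONDITIONS = {
    1: ("Forecast Only", "Accuracy"),
    2: ("Forecast Only", "Competitive"),
    3: ("Guessing Others", "Accuracy"),
    4: ("Guessing Others", "Competitive"),
}

def decode_condition(condition):
    """Return task and incentive labels for a Condition code (int or numeric string)."""
    task, incentive = STUDY2_CONDITIONS[int(condition)]
    return {"task": task, "incentive": incentive}

print(decode_condition("3"))
```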

subCondition: keeps track of the weighting between public and private signals, with subCondition=1 for 9 shared, 3 private signals, w=0.25; subCondition=2 for 9 shared, 9 private signals, w=0.5; subCondition=3 for 3 shared, 9 private signals, w=0.75

weight: the relative information weight w on private versus shared signals for that coin

Coin: coin number (1 through 8, each participant gave responses for 8 different coins)

SharedSum: the number of heads observed in the shared signal for that coin

PrivateSum: the number of heads observed in the private signal for that coin

theta: true bias of the coin

f: participant’s own forecast for that coin

g: participant’s guess of others’ average forecast for that coin

bonusQinitial2tries: indicator for whether the participant on their first attempt gave the incorrect answer to the following quiz question checking whether they understood the bonus payment scheme for their treatment, but answered it correctly on their second attempt. The quiz text was “Before we begin, we have a question to make sure you understand how the bonus payment will be determined. In this study, we will randomly choose one of the coins, and flip it 100 times. Your bonus payment will be based on:” with possible answers of "How close your forecast of the number of Heads is to the number of Heads in 100 new flips of the selected coin.", "How close your guess about others is to the number of Heads in 100 new flips of the selected coin.", "You will receive a bonus if your forecast of the number of Heads differs by more than 10 units from the number of Heads in 100 new flips of the selected coin.", "How close your forecast of the number of Heads is to the number of Heads in 100 new flips of the selected coin and how close your guess about others is to the average forecast that other participants give for the selected coin.", "Only the most accurate forecaster will receive a bonus. To receive payment, your forecast of the number of Heads needs to be closer to the actual number of Heads in 100 new flips of the selected coin than all of the other participants' forecasts."

failBonusQfinalcheck: indicator for whether the participant gave the incorrect answer to the same quiz checking whether they understood the bonus payment scheme for their treatment (detailed above) at the end of the survey 

age: participant’s stated age

female: indicator variable for whether participant stated their gender as female
	
H_always: Participants were asked “What type of device are you using to take this survey?” The possible answers are coded as 1 = “Desktop computer”, 2 = “Laptop”, 3 = “Tablet”, 4 = “Smartphone”, 5 = “Other (please indicate)”

H_always_TEXT: Optional text response to explain answer to H_always question above

device: Participants were asked “What type of device are you using to take this survey?” The possible answers are coded as 1 = “Desktop computer”, 2 = “Laptop”, 3 = “Tablet”, 4 = “Smartphone”, 5 = “Other (please indicate)”

comments: Any comments participant provided at the end of the study



################################################################



Study 3 Variables. Note Study 3 responses are stored in a “wide” format, so that responses for all 10 bundles for a participant are listed in a single row in the csv file.

subject: unique participant identifier

f1 through f10: participant’s own judgment for grocery bundles 1 through 10

g1 through g10: participant’s guess of others’ average judgment for grocery bundles 1 through 10



################################################################



Study 4 Variables. Note Study 4 responses are stored in a “long” format, so that responses for each of the 8 or 16 games for a participant are listed in separate rows in the csv file. The March Madness data set is composed of data from each of the years 2014, 2015, and 2016. In each year, there were three sets of games that were asked about in separate subject pools. Set 1 was 16 games for the South and East regions from the round of 64, Set 2 was the other 16 games from the West and Midwest regions in the round of 64, and Set 3 was 8 games from the round of 16 (for a total of 40 different games). Quiz questions about interest in and knowledge of NCAA basketball are denoted SK1 through SK4 and OK1 through OK4.

subject: unique participant identifier

minutes: number of minutes elapsed from start to finish of the study for the participant

year: NCAA basketball tournament year

gameset: identifier for the subset of NCAA basketball games participant was asked to forecast for that year

gamenumber: game number (1 through 8 or 1 through 16) within the gameset

teamchosen: whether the participant selected team 1 or team 2 to win

fTeamChosen: participant's forecast of the probability (*100) that their chosen team will win the game.

gTeamChosen: participant's guess of others' average forecast of the probability (*100) that their chosen team will win the game.

f: participant's forecast recast in terms of the probability (*100) that the team designated as Team 1 will win the game.

g: participant's guess of the average forecast of others recast in terms of the probability (*100) that the team designated as Team 1 will win the game.
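
The recasting from chosen-team probabilities (fTeamChosen, gTeamChosen) to Team 1 probabilities (f, g) presumably amounts to flipping the value when Team 2 was chosen. The Python sketch below illustrates that reading. It assumes teamchosen is coded as 1 or 2 and that probabilities are on the 0 to 100 scale described above; the function name is hypothetical and this is not taken from Studies1234.R.

```python
def recast_to_team1(team_chosen, prob_chosen):
    """Convert a probability (*100) for the chosen team into one for Team 1.

    Assumes team_chosen is 1 or 2 and prob_chosen lies in [0, 100].
    """
    if team_chosen == 1:
        return prob_chosen
    if team_chosen == 2:
        return 100 - prob_chosen
    raise ValueError(f"unexpected teamchosen value: {team_chosen!r}")

print(recast_to_team1(1, 70))  # 70
print(recast_to_team1(2, 70))  # 30
```

Under this reading, the same function applies to both the forecast (fTeamChosen to f) and the guess about others (gTeamChosen to g).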

SK1: How knowledgeable are you about college basketball? 1=lowest, 5=highest

SK2: During college basketball season, how many games do you typically watch per week on TV or in person? 1= 0, 2= 1 or 2, 3 = 3 or 4, 4 = 5 or 6, 5 = >6

SK3: During basketball season, how many days a week do you read newspaper and website stories about college basketball? 1=0, 2= 1 or 2, 3= 3 or 4, 4= 5 or 6, 5= every day

SK4: Did you fill out a bracket for the NCAA men's basketball tournament? 1=I already filled mine out, 2=I plan to fill one out, 3=I will not be filling one out this year, but I have in the past, 4=I will not be filling one out this year, and never have in past years

OK1: After committing how many fouls does a player foul out in NCAA basketball? 1=4, 2=5, 3=6, 4=7 (correct answer is "2" (5 fouls))

OK2: How many seconds are on an NCAA shot clock? 1=30, 2=35, 3=40, 4=45 (correct answer is "2" (35))

OK3: How many minutes are in a regulation NCAA basketball game? 1=30, 2=40, 3=50, 4=60 (correct answer is "2" (40))

OK4: Which of these occurs when a team commits 7 fouls in a half? 1=The opposing team gets to shoot a technical foul, 2=The opposing team goes into the "one and one", 3=The opposing team is awarded a jump ball, 4=The opposing team is given an extra point per basket (correct answer is "2")
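
Since the correct answer to each of OK1 through OK4 is coded as 2, an objective-knowledge score can be computed as the count of correct answers. The Python sketch below shows this for a single row represented as a dict. The names OK_CORRECT and knowledge_score are hypothetical, and summing correct answers is one possible scoring choice, not necessarily the one used in the paper.

```python
# Correct answer codes for the objective-knowledge questions, as listed above
OK_CORRECT = {"OK1": 2, "OK2": 2, "OK3": 2, "OK4": 2}

def knowledge_score(row):
    """Count how many of OK1..OK4 were answered correctly (values may be ints or numeric strings)."""
    return sum(int(row[q]) == answer for q, answer in OK_CORRECT.items())

# Made-up responses for illustration only
example_row = {"OK1": "2", "OK2": "1", "OK3": "2", "OK4": "2"}
print(knowledge_score(example_row))  # 3
```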

comments: Any comments participant provided at the end of the study
